63 research outputs found

    Automatic recognition of fingerspelled words in British Sign Language

    We investigate the problem of recognizing words from video, fingerspelled using the British Sign Language (BSL) fingerspelling alphabet. This is a challenging task, since the BSL alphabet involves both hands occluding each other and contains signs which are ambiguous from the observer's viewpoint. The main contributions of our work include: (i) recognition based on hand shape alone, not requiring motion cues; (ii) robust visual features for hand-shape recognition; (iii) scalability to large-lexicon recognition with no re-training. We report results on a dataset of 1,000 low-quality webcam videos of 100 words. The proposed method achieves a word recognition accuracy of 98.9%.
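
    The abstract does not spell out the decoding step, but the lexicon-free scaling can be illustrated with a minimal sketch: assume a hypothetical frame-level hand-shape classifier that emits per-letter posteriors, and score each candidate word by the best monotonic alignment of its spelling to the frame sequence. Everything below (function names, the posterior matrix) is an assumption for illustration, not the paper's method.

        import numpy as np

        def word_score(posteriors, word, alphabet):
            # Best monotonic alignment of the word's letters to the
            # per-frame letter posteriors (Viterbi-style dynamic programme).
            # posteriors: (T, |alphabet|) array from a hypothetical
            # frame-level hand-shape classifier.
            idx = [alphabet.index(ch) for ch in word]
            T, L = posteriors.shape[0], len(idx)
            logp = np.log(posteriors + 1e-12)
            dp = np.full((T, L), -np.inf)
            dp[0, 0] = logp[0, idx[0]]
            for t in range(1, T):
                for l in range(L):
                    stay = dp[t - 1, l]                            # dwell on the same letter
                    step = dp[t - 1, l - 1] if l > 0 else -np.inf  # advance to the next letter
                    dp[t, l] = max(stay, step) + logp[t, idx[l]]
            return dp[-1, -1]

        def recognize(posteriors, lexicon, alphabet):
            # Enlarging the lexicon only adds spellings here; the
            # frame-level classifier is never re-trained.
            return max(lexicon, key=lambda w: word_score(posteriors, w, alphabet))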

    Sign Language Recognition

    This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief introduction to the motivations and requirements, followed by a précis of sign linguistics and its impact on the field. The types of data available and their relative merits are explored, allowing examination of the features which can be extracted. Classifying the manual aspects of sign (similar to gestures) is then discussed from tracking and non-tracking viewpoints, before summarising some of the approaches to the non-manual aspects of sign languages. Methods for combining the sign classification results into full SLR are given, showing the progression towards speech-recognition techniques and the further adaptations required for the sign-specific case. Finally, the current frontiers are discussed and recent research presented. This covers the task of continuous sign recognition, the work towards true signer independence, how to effectively combine the different modalities of sign, making use of current linguistic research, and adapting to larger, noisier data sets.
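
    As an illustration of that progression towards speech-recognition techniques, here is a minimal sketch of the classic isolated-sign setup: one HMM per sign, classification by maximum likelihood. It assumes the hmmlearn library and pre-extracted per-frame hand features; this is a generic pipeline for illustration, not a specific system from the chapter.

        import numpy as np
        from hmmlearn import hmm  # assumed dependency; any HMM library would do

        def train_sign_models(training_data, n_states=5):
            # One HMM per sign, as in speech-recognition pipelines
            # adapted to SLR. training_data maps each sign label to a
            # list of (T_i, D) feature sequences (e.g. tracked hand
            # position and shape features per frame).
            models = {}
            for label, seqs in training_data.items():
                X = np.vstack(seqs)
                lengths = [len(s) for s in seqs]
                m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
                m.fit(X, lengths)
                models[label] = m
            return models

        def classify(models, seq):
            # Isolated-sign recognition: pick the model with the
            # highest log-likelihood for the observed sequence.
            return max(models, key=lambda lbl: models[lbl].score(seq))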

    Online Kernel Slow Feature Analysis for Temporal Video Segmentation and Tracking

    Scene analysis by mid-level attribute learning using 2D LSTM networks and an application to web-image tagging

    This paper describes an approach to scene analysis based on supervised training of 2D Long Short-Term Memory recurrent neural networks (LSTM networks). Unlike previous methods, our approach requires no manual construction of feature hierarchies or incorporation of other prior knowledge. Rather, like deep learning approaches using convolutional networks, our recognition networks are trained directly on raw pixel values. However, in contrast to convolutional neural networks, our approach uses 2D LSTM networks at all levels. Our networks yield per-pixel mid-level classifications of input images; since training data for such applications is not available in large numbers, we describe an approach to generating artificial training data and then evaluate the trained networks on real-world images. Our approach performed significantly better than other methods, including convolutional neural networks (ConvNets), while using two orders of magnitude fewer parameters. We further evaluate on a recently published outdoor scene attribute dataset, allowing a fair comparison of scene attribute learning, and observe a significant performance improvement (ca. 21%). Finally, our approach is successfully applied to a real-world application, automatic web-image tagging.
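
    The defining difference from a convolutional layer is the 2D recurrence: each pixel's hidden state depends on the states above it and to its left, so activations can summarise arbitrarily distant image context. A rough NumPy sketch of one scan direction follows (a full multi-directional layer runs four such scans; the weight shapes are hypothetical, not taken from the paper).

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def mdlstm_scan(img, W, b, d_h):
            # img: (rows, cols, d_in) raw pixel values;
            # W: (d_in + 2*d_h, 5*d_h), b: (5*d_h,) -- hypothetical shapes.
            rows, cols, _ = img.shape
            h = np.zeros((rows + 1, cols + 1, d_h))  # zero-padded borders
            c = np.zeros((rows + 1, cols + 1, d_h))
            for i in range(1, rows + 1):
                for j in range(1, cols + 1):
                    # Recurrence over BOTH spatial predecessors: above and left.
                    x = np.concatenate([img[i - 1, j - 1], h[i - 1, j], h[i, j - 1]])
                    z = x @ W + b
                    ig, og, fu, fl = (sigmoid(z[k * d_h:(k + 1) * d_h]) for k in range(4))
                    g = np.tanh(z[4 * d_h:])
                    # Two forget gates, one per spatial predecessor.
                    c[i, j] = fu * c[i - 1, j] + fl * c[i, j - 1] + ig * g
                    h[i, j] = og * np.tanh(c[i, j])
            return h[1:, 1:]  # per-pixel features for mid-level classification

    A per-pixel softmax over these features would then produce the mid-level attribute labels; as the abstract notes, the networks are trained on artificially generated data because per-pixel annotations are scarce.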

    Combination of Multiple Aligned Recognition Outputs using WFST and LSTM

    The contribution of this paper is a new strategy for integrating the outputs of multiple, diverse recognizers. Such an integration can give higher performance and more accurate outputs than a single recognition system. The problem of aligning various Optical Character Recognition (OCR) results lies in the difficulty of finding correspondences at the character, word, line, and page level. These difficulties arise from the segmentation and recognition errors produced by the OCRs, so alignment techniques are required to synchronize the outputs before they can be compared. Most existing approaches fail when the same error occurs in multiple OCRs; if the correct characters do not appear in at least one of the OCR outputs, such approaches are unable to improve the results. We design a Line-to-Page alignment with edit rules using Weighted Finite-State Transducers (WFST). These edit rules are based on the edit operations insertion, deletion, and substitution. We further design an approach using Recurrent Neural Networks with Long Short-Term Memory (LSTM) to predict these types of errors, together with a Character-Epsilon alignment that normalizes the lengths of the strings for the LSTM alignment. The LSTM returns the best voting, especially when heuristic approaches are unable to vote among the various OCR engines; it can predict the correct characters even when no OCR produced them in its output. The approaches are evaluated on OCR outputs from the UWIII and historical German Fraktur datasets, obtained from state-of-the-art OCR systems. The experiments show that the LSTM approach has the best performance, with an error rate of around 0.40%, while the other approaches lie between 1.26% and 2.31%.
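
    The WFST machinery is not reproduced here, but the character-epsilon idea can be sketched directly: align two OCR strings using the three edit operations and pad unmatched positions with an epsilon symbol, so the outputs become equal-length and can be compared column by column before heuristic voting or LSTM correction. A minimal sketch, not the authors' implementation:

        EPS = "ε"  # epsilon symbol used to pad alignments to equal length

        def align(a, b):
            # Edit-distance table over insertion, deletion and substitution.
            n, m = len(a), len(b)
            d = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(n + 1):
                d[i][0] = i
            for j in range(m + 1):
                d[0][j] = j
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                                  d[i][j - 1] + 1,                           # insertion
                                  d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
            # Trace back, inserting epsilons where a character has no partner.
            out_a, out_b, i, j = [], [], n, m
            while i or j:
                if i and j and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
                    out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
                elif i and d[i][j] == d[i - 1][j] + 1:
                    out_a.append(a[i - 1]); out_b.append(EPS); i -= 1
                else:
                    out_a.append(EPS); out_b.append(b[j - 1]); j -= 1
            return "".join(reversed(out_a)), "".join(reversed(out_b))

        # align("hello", "helo") -> ("hello", "heεlo"); the epsilon-padded
        # columns are what the voter (or the LSTM) then decides between.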